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Abstract. In the context of Markov decision processes running in con¬ 
tinuous time, one of the most intriguing challenges is the efficient approx¬ 
imation of finite horizon reachability objectives. A multitude of sophisti¬ 
cated model checking algorithms have been proposed for this. However, 
no proper benchmarking has been performed thus far. 

This paper presents a novel and yet simple solution: an algorithm, orig¬ 
inally developed for a restricted subclass of models and a subclass of 
schedulers, can be twisted so as to become competitive with the more 
sophisticated algorithms in full generality. As the second main contribu¬ 
tion, we perform a comparative evaluation of the core algorithmic con¬ 
cepts on an extensive set of benchmarks varying over all key parameters: 
model size, amount of non-determinism, time horizon, and precision. 


1 Introduction 

Over the last two decades, a formal approach to quantitative performance and 
dependability evaluation of concurrent systems has gained maturity. At its root 
are continuous-time Markov chains (CTMC) for which efficient and quantifiably 
precise solution methods exist [5] . A CTMC can be viewed as a labelled transition 
system (LTS) whose transitions are delayed according to exponential distribu¬ 
tions. CTMCs are stochastic processes and thus do not support non-determinism. 
Non-determinism, often present in classical concurrency and automata theory 
models, is useful for modelling uncertainty or for performing optimisation over 
multiple choices. The genuine extension of CTMCs with non-determinism are 
continuous time Markov decision processes (CTMDPs). The non-determinism is 
controlled by an object called scheduler (also policy or strategy). 

Prominent applications of CTMDPs include power management and schedul¬ 
ing m, networked, distributed systems mm, epidemic and population pro¬ 
cesses [20] 5 economy |S] and others. Moreover, CTMDPs are the core semantic 
model |8] underlying formalisms such as generalised stochastic Petri nets [2T] . 
Markovian stochastic activity networks [22] and interactive Markov chains m- 

When model checking a CTMDP [6], one asks whether the behaviour of the 
model for some schedulers (if we control the non-determinism) or for all sched¬ 
ulers (if it is out of control) satisfies given performance or dependability criteria. 
A large variety of them can be expressed using logics such as CSL [Tj. At the 
centre of model-checking problems for such criteria is time bounded reachability: 
What is the maximal/minimal probability to reach a given set of states within a 




given time bound. Having an efficient approach for this optimisation (maximisa¬ 
tion or minimisation) is crucial for successful large-scale applications. 

In order to not discriminate against real situations, one usually assumes 
that the scheduler can base its decisions on any available information about the 
past. Restricting the information however tends to imply cheaper approximative 
algorithms mm- For CTMDPs, we can distinguish (general) timed optimal 
scheduling and (restricted) untimed optimal scheduling |8l4j . In the latter case, 
the scheduler has no possibility, intuitively speaking, to look at a clock measuring 
time. Another distinction within timed optimality discussed in the literature 
is early optimal scheduling (where every decision is frozen in between state 
changes [26115] ) and late optimal scheduling (where every decision can change 
as time passes while residing in a state HE])- 

A handful of sophisticated algorithms have been suggested for timed opti¬ 
mality (partly for early optimality, partly for late optimality) signifying both 
the importance and the difficulty of this problem mm- This paper presents 
a substantially different algorithm addressing this very problem. The approach 
is readily applicable to both early and late optimality. It harvest a very efficient 
algorithm for untimed optimality [5| originally restricted to a subclass of models. 
By a simple twist, we make it applicable for the general timed optimality for 
arbitrary models. As a second contribution, we present an exhaustive empirical 
comparison of this novel algorithm with all other published algorithms for the 
(early or late) timed optimality problem. We do so on an extensive collection 
of scalable industrial and academic CTMDP benchmarks (that we also make 
available). Notably, all earlier evaluations did compare at most two algorithms 
on at most one or two principal cases. We instead cross-compare 5 algorithms 
on 7 application cases, yielding a total of about 2350 distinct configurations. 
The results demonstrate that our simple algorithm is highly efficient across the 
entire spectrum of models, except for some of the experiments where extreme 
precision is required. On the other hand, no algorithm is consistently dominating 
any other algorithm across the experiments performed. 

Related work. Timed optimal scheduling has been considered for many decades 
both theoretically FTiF?)] and practically by introducing approximative algo¬ 
rithms. Formal error bounds needed for verification have been studied only re¬ 
cently |26n5lhl7j . Fragmentary empirical evaluations of some of the published 
algorithms have been performed mm- In a nutshell, the published knowledge 
boils down to m\ ^ijs] ffS and |5S] <gj m <[i] 0. where a <[.] b denotes “5 
is shown empirically faster than a in [•]”. A substantial cross-comparison of the 
newest three algorithms is however lacking. 

Contribution of the paper. The paper {%) develops a novel and simple approxima¬ 
tion method for time bounded CTMDP reachability, {ii) presents the first ever 
set of benchmarks for CTMDP model checking, and (iii) performs an empirical 
evaluation across benchmarks and algorithms. The evaluation suggests that the 
optimal timing of decisions for time bounded reachability can be solved effec¬ 
tively by a rather straightforward algorithm, unless extreme precision is needed. 
















2 Preliminaries 


Definition 1. A continuous-time Markov decision process (CTMDP) is a tuple 
C = {S, Act, R) where S is a finite set of states, Act is a finite set of actions, 
and R : S' X Act x S —>■ K>o is a rate function. 

We call an action a enabled in s, also denoted by a G Act(s), if R(s, a, s') > 0 
for some s' € S. We require that all sets Act{s) are non-empty. A continuous¬ 
time Markov chain (CTMC) is a CTMDP where all Act{s) are singleton sets. 

For a given state s and action a G Act{s), we denote by E{s, a) = 
the exit rate of a in s. Finally, we let P(s,a, s') := R(s, a, s')/if(s, a). 

The operational behaviour of a CTMDP is like in a CTMC. Namely, when 
performing a given action oq in a state sq, the CTMDP waits for a transition, i.e. 
waits for a delay to chosen randomly according to an exponential distribution 
with rate E(so,ao). The transition leads to a state Si again chosen randomly 
according to the probability distribution P(so, ao, •). When performing an action 
oi there, it similarly waits for time ti and makes a transition into a state S 2 and 
so on, forming an infinite run sotositi ■ ■ ■. 

The difference to a CTMC lies in the need to choose actions to perform, 
done by a scheduler. There are two classes of schedulers, early and late. Whenever 
entering a state, an early scheduler needs to choose and commit to a next action, 
whereas late schedulers may change such choices at any time later while residing 
in the state. In this paper we restrict w.l.o.g. [23] to deterministic schedulers but 
we allow the decision to depend on the whole history soto ■ • • so far. 

Definition 2. A (timed late) randomised scheduler is a measurabl^function a 
that to any history h = soto • ■ ■ tn-iSn and time t > 0 spent in Sn so far assigns 
a distribution over enabled actions Act(sn). We call a early if a{h,t) = a{h,t') 
for all h,t,t'; and deterministic if a{h,t) assign 1 to some action a for all h,t. 

We denote the set of all (timed) late or early schedulers by Time and Time, 
respectively. We use these subscripts V G {£,e} throughout the paper to dis¬ 
tinguish between the late and the early setting. Furthermore, a scheduler a is 
called untimed if a{h,t) = a{h',t') whenever h and h' contain the same se¬ 
quence of states. By Unt we denote the set of all untimed schedulers. Note that 
Unt C Time C Time. 

Fixing a scheduler a and an initial state s in a CTMDP C, we obtain the 
unique probability measure Pr^’® over the space of all runs by standard defini¬ 
tions |2l], denoted also by Pr^ when C is clear from context. 

Problem 1 (Maximum Time-Bounded Reachability) LetC = (S', Act, R), 
G C S be a set of goal states, T G M>o a time bound, and V G {£,e}. Approxi¬ 
mate the values valj G [0,1]"®, where each valj(s) maximises the probability 

valj (s) := sup Pr® [O-^G] 

crG Tim V 

of runs O-^G = {sq^o • • • | 3* : Si G G A ^ reaching G before T. 


^ Measurable with respect to the standard c-algebra on the set of finite histories [24| . 



Whenever C is clear from context, we write val^. We call cr S Timy e-optimal 
if Pr® [O-^G] > val^(s) — e for all s G S, and optimal if it is 0-optimal. 

By minor changes, all results of the paper also address the dual problem of 
minimum time bounded reachability that we omit to simplify the presentation. 

Remark 1. There exists a value preserving encoding of early scheduling into late 
scheduling in CTMDPs [55] ■ It has exponential space complexity (due to the 
number of induced transitions). This exponentiality does arise in practice, e.g. 
for the stochastic job scheduling problem considered later. Therefore we treat 
the two algorithmic settings separately. Early scheduling is natural for models 
derived from generalised stochastic Petri nets or interactive Markov chains. 

3 Unif+: Optimal Time-Bounded Reachability Revisited 

In this section, we develop a novel and simple algorithm for Problem We fix 
C = (5', Act, R), G C S', T G K>o, V G {(., e} and an approximation error e > 0. 
Furthermore, let E^ax '■= maxs ^ ^(s, a) denote the maximal exit rate in C. 

In contrast to existing methods, our approach does not involve discretisa¬ 
tion. The algorithm instead builds upon uniformisation US] and untimed analy¬ 
sis |3|4l29]. It is outlined in Algorithm Technically, it is based on an iterative 
computation of tighter and tighter lower and upper bounds on the values until 
the required precision is met. In the hrst iteration, a uniformisation rate A is 
set to Emax, in every further iteration its value is doubled. In every iteration, 
we compute a lower bound yal and an upper bound toI by two types of untimed 
analyses on the CTMDP obtained by uniformising C to the rate A. In the 
remainder of this section, we explain the individual steps of Algorithm [l] and 
prove correctness and termination. 

Informally, the lower bound is based on maximum time bounded reachability 
with respect to the untimed scheduler subclass |5]. The upper bound, similarly to 


Algorithm 1: Unif+ 

input : CTMDP C = (S, Act, R), goal states G G S, horizon T G K>o, 
scheduler class V G {£, e}, and approximation error e > 0 
params: truncation error ratio k G (0, 1) 
output : vector v such that ||v — val^||oo < e and A 

1 A -f- maximal exit rate Emax in C 

2 repeat 

3 I Cj V-uniformisation of C to the rate A 

4 y •<— approximation of the lower bound yal for Cj up to error s ■ k 

5 V <— approximation of the upper bound val for up to error e ■ k 

6 I A ^ 2 • A 

7 until IIV — y||oo <£•(! — «:) 

8 return y, A 







the one in [7], is based on prophetic untimed schedulers that yield higher value 
than timed schedulers by knowing in advance how many steps will be taken 
within time T. The intuition is that an untimed scheduler can approximately 
observe the elapse of time by knowing the count of steps taken and the ex¬ 
pected delay per every step. In uniformised models, these delay expectations are 
identical across all states (forming a Poisson process) and therefore allow easy 
access to the expected total elapsed time. By uniformising the model with higher 
and higher uniformisation rates, this implicit knowledge of untimed schedulers 
increases. On the other hand, the knowledge of prophetic untimed schedulers 
decreases; both approaching the power of timed schedulers. 


3.1 Uniformisation to 

CTMDP C may have transitions with very different rates across different states 
and actions. Here, we discuss how to perform uniformisation for such a model. 
This is a conceptually well-known idea [18j . Applying it to C intuitively makes 
transitions occur with a higher rate A > Emax, uniformly across all states and 
actions. 

To ensure that uniformisation does not change the schedulable behaviour, we 
need distinct uniformisation procedures for the early and the late setting. Late 
uniformisation is straightforward, it adds self-loops to states and actions where 
needed. 

Definition 3 (Late uniformisation). For X > Emax we define the late uni¬ 
formisation of C to rate X as a CTMDP = (5, Act, R^) where 


Ri(s,a,s') 


R(s, a, s') if s s', 

X— R(s, a, s") if s = s'. 

s"^s 


Example 1. For the fragmentary CTMDP C depicted below on the left, its late 
uniformisation to rate 4.5 is depicted in the middle. 



Using the same transformation for the early setting would give the scheduler the 
spurious possibility to “reconsider” the choice of the action in a state whenever a 
newly added self-loop is taken. To exclude that possibility, early uniformisation 
introduces a copy state (s,a) for each state s and action a so as to “freeze” 
the commitment of choosing action a until the next state change occurs. The 
construction is shown on the right. States of the form (s,T) correspond to the 
original states, i.e. those where no action has been committed to yet. 



Definition 4 (Early uniformisation). For A > Emax, the early uniformisa- 
tion of C to rate X is a CTMDP = (S x ({_L} U Act), Act, R^) where for every 
state (s, •), action a G Act, and every successor state (s', o) we have 

rR(s,a, s') */o = _L, 

R-A((s>-),a, (s',o)) := / A - E{s,a) ifo = a,s = s', 

[O elsewhere. 

Uniformisation preserves the value of time-bounded reachability for both early [24| 
and late schedulers [23]. 

Lemma 1. VA > E^ax- valj = valjv, i.e. uniformisation preserves the value. 

As a result, we can proceed by bounding the values of Cj for large enough A 
instead of bounding the values of the original CTMDP C. 


3.2 Lower and upper bounds on the value of 

We now fix a A and consider a uniform CTMDP Cj. We denote by ()z[G the 
subset of runs ()-^G reaching the target where exactly i steps are taken up to 
time T. With this, we define the bounds by ranging over Unt schedulers in Cj: 


val(s) := sup Pr 


(tG Unt 


OzjG 


oo 


val(s) > sup Pr^ 




2=0 


crG Unt 


OziG 


Since all 0=^ G are disjoint and ()-^ G = UieNo value yal is the opti¬ 

mal reachability probability of standard untimed schedulers on the uniformised 
model. It will serve as a lower bound on the values val^. The value wtl, on 
the other hand, which has the supremum and summation swapped, does not 
correspond to the value of any realistic scheduler. Intuitively, it is the value of 
a prophetic untimed scheduler, which for each particular run knows how many 
steps will be taken (as for every i, a different standard scheduler cr may be used). 
This knowledge makes the scheduler more powerful than any other timed one: 

Lemma 2. It holds thatvaF < val^, and for any CTMDP Cj, yal < val^ < wrl. 


Approximating the bounds. Since yal and tal are defined via infinite sum¬ 
mations, we need to approximate these bounds. We do so by iterative algorithms 
truncating the sums. This is what is computed in line 4 and 5 of Algorithm 
Each truncation induces an error of up to e • n. 

Let if\{k) denote the Poisson distribution with parameter XT at point k, 
i.e. the probability that exactly k transitions are taken in the CTMDP Cj before 
time T. Furthermore, let N = [ATe^ — ln(£ • k)] , where e is the Euler’s number. 
We recursively define for every 0 < k < N and every state s, functions 

fO itk = N, 

Yfe(s) = < J2f=~k V’A(f) if fc < A and s e G, 

[maxa J2s' Pa ’ Yfc+i(s') if fc < A and s ^ G, 






fo if A: = TV, 

Wfc(s) = s 1 if k < N and s € G, 

[ maxa J2s' Pa («, a, s') ’ Wfe+i (s') if k < N and s ^ G, 

N-l 

Vfc(s) = ^ • w(7v-i)-(i-fe)(s), 

i—k 

where denotes the transition probability matrix of Cj. 

Lemma 3. In any CTMDP Cj, ||yg — val||oo < e ■ k and ||vo — val||oo < e ■ k. 

We compute Vq as in the untimed analysis of uniform models j3], which in 
turn agrees with the standard “uniformisation” algorithm for CTMCs when the 
maximisation is dropped. The computation of w^ is analogous to step-bounded 
reachability for discrete-time Markov decision processes, where the reachability 
probabilities for different step-bounds are weighted by the Poisson distribution 
in the end in vq- Both vectors can be computed in time 0{N ■ • |Act|). 

Numerieal Aspects. In practice also Vg and vq can only be approximated due to 
presence of 'il)\{k). For details how the overall error bound is met in an analogous 
setting, see [3]. For high values of A and thus also TV, the Poisson values '4’\{k) 
are low for most 0 < k < N and also the values in get close to 1 when on 
the diagonal and to 0 when off-diagonal. Where high precision is required and 
thus high A may be needed, attention has to be paid to numerical stability. 

3.3 Convergence of the bounds for increasing A 

An essential part for the correctness of Algorithm is its convergence: 

Lemma 4. We have lim g\—>■ 0 where g\ denotes the gap ||yM — wlHoo inCj. 

A—voo 

Proof Idea. We here provide an intuition of the core of the proof, namely why uni¬ 
formisation with higher A increases the power of 
untimed schedulers and decreases the power of 
prophetic ones: The count of transitions taken so 
far gives untimed schedulers approximate knowl¬ 
edge of how much time has elapsed. In situa¬ 
tions with the same expeetation of elapsed time, 
a higher uniformisation rate induces a lower vari¬ 
ance of elapsed time. On the right, we illustrate 
comparable situations for different uniformisation 
rates, after 5 transitions with rate 0.5 and after 100 transitions with rate 10. Both 
depicted cumulative distribution functions of elapsed time have expectation 10 
but the latter is way steeper, providing a more precise knowledge of time. 

At the same time prophetic schedulers on the high-rate uniformised model 
are less powerful than on the original one. When taking decisions, the future evo¬ 
lution is influenced by two types of randomness: (a) continuous timing, i.e. how 






many further transitions will be taken before the time horizon and (b) discrete 
branching, i.e. which transitions will be taken. Even though the value stays the 
same for arbitrary A, the “source of” randomness for high A shifts from (a) to 
(b). Namely, the distribution of the number of future transitions also becomes 
steeper for higher A, thus being “less random” by having smaller coefficient of 
variation. At the same time, the discrete branching for higher A influences more 
the number of actual transitions taken (i.e. transitions that are not the added 
self-loops). As a result, the advantage of the prophetic scheduler is only little as 

(i) it boils down to observing the outcome of a less and less random choice and 

(ii) the observed quantity has little impact on how many actual transitions are 
taken. 

As a result of Lemmawe obtain that Algorithmterminates. Its correct¬ 
ness follows from Lemma Bi and 1^ all summarized by the following theorem. 

Theorem 1. Algorithm^ computes an approximation o/val^ up to errors. 

Remark 2. Algorithmdetermines a sufficiently large A in an exponential search 
fashion. In practice, this approach is efficient w.r.t. the total number I of itera¬ 
tions needed, i.e. the total number of times Y/. and w^ are computed from y_^j^i 
and Wfe+i. Namely, in practice the error monotonously decreases when the rate 
increases (not in theory but we never encountered the opposite case on our ex¬ 
tensive experiments.) As a result, A found by Algorithm satisfies A < 2 • A* 
where A* is the minimal sufficiently large rate. As the number of iterations 
needed for one approximation is linear in the uniformisation rate used, we have 
I = 2/a < 4 • /a* , where each ly denotes the number of iterations needed for 
the computation for the fixed rate Ah 


3.4 Extracting the scheduler 

By computing the lower bound, Algorithmj^also produces [3] an untimed sched¬ 
uler that is £-optimal on the uniformised model C J. In the original CTMDP 
C, we cannot use aj directly as its choices are tailored to the high rate A. We can 
however use a stochastic update scheduler attaining the same value. Informally, 
a (timed) stochastie update scheduler a = (At,(Tu,7ro) operates over a countable 
set M. of memory elements where the initial memory value is chosen randomly 
according to the distribution ttq over A4. The stochastic update function (t„, 
given the current memory element, state, and the time spent there, defines a 
distribution specifying the action to take and how to update the memory. In¬ 
tuitively, the stochastic update is used for simulating the high-rate transitions 
that would be taken in ; their total count so far is stored in the memory. For 
a formal definition of stochastic update and the construction, see the Appendix. 

Lemma 5. The values (yj,)o<fc<Ar computed by Algorithm^Jor given C, V, and 
e > 0 yield a stochastic update scheduler'a^jj that is e-optimal in C. 


4 Existing Algorithms 


This section briefly reviews the various published algorithms solving Problem 
In contrast to Algorithm (called Unif+ or u+ for short), they all discretise 
time into a finite number of time points ... ,tn where to = 0 and = T. 

They iteratively approximate the values val^(s;fi) := sup^gj^j^^ Pr® [<>-*’G] 
when ti time units remain at state s. Three different iteration concepts have been 
proposed, each approximating val^(s;<i+i) from approximations of val^(s';fi). 

Exponential approximation - early Assuming equidistant points ti one 

can approximate the (early) value function by piece-wise exponential functions. 
A fc-order approximation considers only runs where at most k steps are taken 
between any two time points. This can yield an a priori error bound. The higher 
k, the less time points are required for a given precision, but the more compu¬ 
tation is needed per time point. We refer to these algorithms by ExpSxEP-fc or 
ES-fc for short. Only ES-1 [55] and es-2 [T5] have been implemented so far. 

Polynomial approximation - late m- Another way to approximate the (late) 
value function on equidistant time points uses polynomials. As before, the higher 
the degree of the polynomials, the higher is the computational effort, but the 
number of discretised time points required to assure an a priori error bound 
decreases. We call these algorithms PolyStep-A: or PS-fc in the sequel, only PS-1, 
PS-2, and PS-3 have been implemented. Among these, PS-2 has better worst-case 
behaviour, but PS-3 has been reported to often perform better in practice. 

Adaptive discretisation - late This approach is not based on an a priori error 
bound but instead computes both under- and over-approximations of the values 
val^(s;fi). This allows one to lay out the time points adaptively. Depending on 
the shape of the value function, the time step can be prolonged until the error 
allowed for this step is reached. This greatly reduces the number of time points, 
relative to the worst case. We refer to this algorithm as AdaptStep or AS. 

5 Empirical Evaluation and Comparison 

In this section we present an exhaustive empirical comparison of the different 
algorithmic approaches discussed. 

Benchmarks. The experiments are performed on a diverse collection of pub¬ 
lished benchmark models. This collection is the first of its kind for CTMDP, as 
far as we know and contains the following parametrised models: 

PS-K-J The Polling System case PIM] consists of two stations and one server. 
Incoming requests of J types are buffered in two queues of size K each, until 
they are processed by the server and delivered to their station. We consider 
the undesirable states with both queues being full to form the goal state set. 






QS -K-J The Queuing System [14] stores requests of J different types into two 
queues of size K. Each queue is attached to a server. Two servers fetch 
requests from their corresponding queues and process them. One of them 
can non-deterministically decide to insert a request after processing into the 
other server’s queue. Goal states are again those with both queues full. 
DPMS-if- J The Dynamic Power Management System |27| is a CTMDP model 
of the internals of a Fujitsu disk drive. The model consists of four compo¬ 
nents: service requester (SR), service queue (SQ), service provider (SP), and 
power manager (PM). SR generates tasks of J types differing in energy de¬ 
mand that are buffered by the queue SQ of size K. Afterwards they are 
delivered to SP to be processed. SP can work in different modes ranging 
from sleep and stand-by to full processing mode, selected by PM. We define 
a state as goal if the queue of at least one task type is full. 

GFS-A^ The Google File System [iniii] splits files into chunks of equal size, each 
chunk is maintained by one of N chunk servers. We fix the number of chunks 
a server may store to 5 000 and the total number of chunks to 100 000. While 
other benchmarks start in optimal conditions, the GFS starts in the broken 
state where no chunk is stored. A state is defined as goal if the system is 
back up and for each chunk at least one copy is available. 

FTWC-A^ The Fault Tolerant Workstation Cluster [El, originally described 
by a GSPN, models two networks of N workstations each, interconnected 
by a switch. The two switches communicate via a backbone. Workstations, 
switches, and the backbone fail after exponentially distributed delays, and 
can be repaired only one at a time. We define a state as goal if in total less 
than N workstations are operational and connected to each other. 
S3S-M-J The stochastic job scheduling |S] models a multiprocessor architecture 
running a sequence of independent jobs. It consists of M identical processors 
and J jobs, where each job’s service time is governed by an exponential 
distribution. As goal we define the desirable states with all jobs completed. 
ES-A'-i? The Erlang Stages is a synthetic model with known characteristics |31j . 
It has two different paths to reach the goal state: a fast but risky path or a 
slow but sure path. The slow path is an Erlang chain of length K and rate R. 
Implementation aspects. Unbiased performance evaluation of algorithms 
originally developed by different researchers is not easy even with all original im¬ 
plementations at hand. Namely, they may use different programming languages 
or rely on different platforms with incomparable performance and memory man¬ 
agement. However, reimplementing a published algorithm may induce unfairness 
as the original implementation may use specific data structures or other optimi¬ 
sations that go beyond what is explained in the respective publication. 

We adapted/implemented all algorithms in G/G-b-b, trying to avoid the 
shortcomings. We used a common infrastructure from the IMGA/MAMA 
toolset [12]. Thus, we could directly use the original IMCA implementations of 
ExpStep-1 and of ExpStep-2 [TS]. The original implementation [B] of Adapt- 
Step in MRMC [TB] needed only minor adaptations, as MRMC uses a data 
structure identical to ours. Finally, for PolyStep, we closely followed the orig¬ 
inal Java code [9]. Our C version clearly outperforms the original Java version. 





Fig. 1: Selected experiments: Increasing state space size. 


We implemented all algorithms with standard double precision arithmetic, 
observing no issues with numerical stability in our experiments. All values com¬ 
puted by different algorithms lie within the expected precision from each other. 

We used parameter values = 10 and uj = 0.1 for AS, as recommended. 
We always ran both adaptive and non-adaptive variant of AS and display the 
better results (mostly adaptive). Based on our tests, we fixed k := 0.1 for u+. 

Empirical Results. In this section we present our empirical observations. We 
consider early and late scheduling problems separately (because the encoding 
mentioned in Remark]^ of Section]^ is exponential); only Unif+ can be directly 
run on both problems. All experiments were run on a single core of Intel Core 
17-4790 with 16GB of RAM, computing a total of about 2350 data points. 

The memory requirements of all the considered algorithms do not deviate 
considerably and thus are not reported. This echoes that all space complexities 
are linear in the model size. We encountered no significant impact of additional 
dependencies of PolyStep on a hidden model parameter (number of “switching 
points”, coarsely bounded in i)- 

In the following, we focus on the time requirements. We first show plots of 
a few selected experiments that represent well our general observations. Later, 
we give a short summary of all experiments. All plots presented below use log¬ 
arithmic scale for the runtime (in seconds). Some data points are missing as we 
applied a time limit of 15 minutes for every computation and also because the 
original implementation of ExpStep-2 cannot handle models with more than 
two actions per state. We use symbol to denote the maximal number of action 
choices and A for the maximal exit rate. We use the symbol “x” whenever the 
varying parameter is a part of the model name, e.g. PS-2-x. 

State space. In Figure we illustrate the effect of enlarging the state space. 
On the left there is a plot for early algorithms representing the general trend: 
Unif+ outperforms ExpStep-1 (as well as ExpStep-2 where applicable). 
For late algorithms in the plots on the right, the situation is more diverse, 
with Unif+ and AdaptStep outperforming the PolyStep algorithms. All 
algorithms exhibit similar dependency on the growth of the state space. 









PS-l-x, early, |S| 6 [17,1445], SJS-2-x, late, |S| 6 [23,341 389], QS-2-x. late, |S| 6 [796,721 838], 
A 6 [2.8,128.8], T = 5, e = 10“'“ A = [3,32], T = 1, e = 10“® A 6 [11.3,44.9], T = 1, e = 10“'‘ 

Fig. 2: Selected experiments: Increasing number of action choices. 


Action choices. Figure displays the effect of increasing the number of ac¬ 
tions to choose from. For early schedulers (left) Unif+ generally dominates 
ExpStep-1. For late schedulers, again Unif+ and AdaptStep dominate 
PolyStep. Increasing the choice options in our models generally induces 
larger state spaces, so the observed growth is not to be attributed to the 
computational difficulty resulting from an increase in choice options alone. 

Precision. Figure details precision dependency. Across all models, Unip+ 
works very well, excepts for some high precision cases, such as the DPMS 
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PS-1-1, early, A= 2, 

S| = 17, A = 2.8, T = 5 


SJS-3-5, late, 60, 

S| = 8851, A = 21. T = 1 


ES-1000-10, late, X= 2, 
jS] = 10 004, A = 10, T = 20 




DPMS-2-2, early, .A, = 2, 
\S\ = 71, A = 2.1, T = 10 


DPMS-2-2, late, A= 2, 
S| = 71, A = 2.1, T = 10 


GFS-40, late, A= 2, |5| = 
9808, A = 492, T = 4 


Fig. 3: Selected experiments: Increasing precision. 






















ES-1000-10, early, A= 2, 
|S| = 1004, A = 10, e = 10“ 


ES-1000-10, late, A= 2, 
|S| = 1004, A = 10, e = 10“ 


PS-1-1, early, A= 2, |S| = 
17, A = 2.8, 6 = 10“'‘ 



PS-4-2, late, A = 4, |S| = QS-2-2, late, A= 8, IS] = FTWC-16, late, A = 5, |51 = 

10 593, A = 5.6, 6 = 10“'* 796, A = 11.3, e = 10“'' 10 130, A = 2.06, e = 10“" 


Fig. 4: Selected experiments: Increasing time bound. 


models, where ExpStep-2 might be preferable over Unif+ in the early set¬ 
ting (bottom left), and similarly for AdaptStep in the late setting (bottom 
middle). The same is true for the GFS case (bottom right). On the other 
hand, for some models (examples in the first row) Unif“'" delivers very high 
precision without any runtime increase. It is also interesting that generally 
the sensitivity of all algorithm to required precision is more than linear in 
the number of precision digits. 

Time bound. Figurej^illustrates the effect of increasing the time bound. Again, 
the UNiF^-algorithm is the least sensitive in the early setting. For late 
scheduling, there are some notable QS instances where PolyStep- 3 out¬ 
performs both AdaptStep and Unif+ (bottom middle). Very large time 
bounds make sense only for a few models (bottom right, log-log-scale). Else¬ 
where, the values converge making it trivial for AS and U^. 

Among the many instances we considered we found a few 
instances where the late UNiF+-algorithm shows surpris¬ 
ing sensitivity to changes in time bound, particularly for 
high precision scenarios. This is exemplified on the right 
(GFS, late, A.= 2, \S\ = 9808, A = 492, e = 10"®, increas¬ 
ing time bound, no log scale). In line with the apparent 
general tendency of the algorithms for increasing param¬ 
eter values, the work and thus time needed tends to increase monotonously. 






























max. |S| 

max 

A 

max. exit 
rate range 

best in early 
(# of cases) 

best in late 
(# of cases) 

PS 

743 969 

7 

5.6-129.6 

U+ (32) 

u+ (47) 

QS 

16 924 

36 

6.5-44.9 

U+ (32) 

PS-3(18), u+ (17), as (15) 

DPMS 

366 148 

7 

2.1-9.1 

u+ (31), es-2(3), n/a(1) 

AS (24), u+ (14), PS-3(6) 

gfs 

15 258 

2 

252 - 612 

U+ (40) 

AS (23), u+ (11) 

ftwc 

2 373 650 

5 

2-3.02 

U+ (25) 

U+ (32) 

SJS 

18 451 

72 

3-32 

U+ (57), es-2(2) 

U+ (70), AS (29) 

ES 

30 004 

2 

10 

U+ (23), es-2(4), n/a(1) 

U+ (28), PS-3(2) 


Table 1: Overview of experiments summarising which algorithm performed best how 
many times; n/a indicates that no algorithm completed within 15 minutes. 


Instead, small variations in time bound may lead to great savings in runtime for 
Unif+. This is rooted in the error calculated while running the algorithm coinci¬ 
dentally falling into the allowed margin. Less extreme examples of this behaviour 
are included in Figure top row and Figure [^bottom middle. We observed such 
time savings only for Unif’*', not for any other algorithm, though conceptually 
the runtime of AdaptStep might profit from similar effects as well. The exact 
conditions of this behaviour are still to be found. 

A complete list of model files, additional statistics, result tables as well as all 
prototype implementations are available at the following URL: 

http://depend.cs.uni-saarland.de/~hahate/atval5/ 


Evaluation and Discussion. The results presented show that a general an¬ 
swer about the relative performance of the proposed algorithms is not easy to 
give, but appears very much dependent on model parameters outside the aware¬ 
ness of the modeller. Thus there is no clear winner across all models. Still, our 
benchmarking, summarised in Table provides some general insights: 

— All algorithms are naturally sensitive to increases in model parameters. Their 
runtime mostly behaves linear in the time bounds and the state space size, 
exponential in precision and superlinear (though still polynomial) in fanout. 

— For early schedulers ExpStep- 1 is not competitive. Unif+ mostly outper¬ 
forms ExpStep-2. 

— For late schedulers PolyStep-1 is not competitive and PolyStep-3 is ef¬ 
fectively faster than PolyStep-2. AdaptStep and Unif+ mostly outper¬ 
form PolyStep-3. Still each of the late algorithms {AdaptStep, Unif+, 
PolyStep-3} is dominating the other two on at least one model instance. 
The particular algorithmic strengths have no obvious relation to model pa¬ 
rameters available to the modeller. 

— For low precision, Unif’*' appears to be the preferred choice. For high preci¬ 
sion, AdaptStep is a more stable choice than Unif+. Yet its performance 
depends on non-obvious model particularities and algorithm parameters. 

All in all, Unif+ is easy to implement for both early and late, and competitive 
across a wide range of models. In settings where an a posteriori error bound is 






enough, a good approximation can be usually obtained by a variant of Unif+ 
that computes only the first iteration and does not increase the uniformisation 
rate (see the accompanying web for the error bounds obtained in experiments). 

6 Conclusion 

This paper has introduced Unif+, a new and simple algorithm for time-bounded 
reachability objectives in CTMDPs. We studied this and all other published al¬ 
gorithms in an extensive comparative evaluation for both early and late schedul¬ 
ing. In general, Unif’*' performs very well across the benchmarks, apart from late 
scheduling and high precision, where it appears hard to predict which of the algo¬ 
rithms Unif+, AdaptStep, PolyStep-3 performs best. One might consider to 
follow an approach inspired by the distributed concurrent solver in Gurobi [13]. 
The idea is to launch all three implementations to run concurrently on distinct 
cores and report the result as soon as the first one terminates. 

For researchers who want to extend an existing CTMC model checker to 
a CTMDP model checker, the obvious choice is the UNiF+-algorithm: It works 
right away for early and for late optimisation, and it requires only a small change 
to the uniformisation subroutine used at the core of CTMC model checking. 
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A Proofs from Section |3] 


We first prove the following auxiliary lemma characterizing the functions that 
are used to approximate the lower and upper bounds. 


Lemma A.l. For every 0 ^ k < N and s G S, we have 
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Yfc(s) = sup 
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Wfc(s) = sup Pr®“ [0<jv-iG 1 s@k] = 

sup Pr®” [O^Af-l 

G 1 s@k] 

gG Unt 


gG Tim'^ 


where is the initial state, s@k are the runs that visit s after k steps and 


not reach G before k steps, and O^at-iG are the runs that reach G within N —1 
steps taken in arbitrary time. 

Proof. Let us hrst assume s G G. We have Vfc(s) = Yj.(s) = and 

Wfe(s) = 1 which is also equal to the right hand sides of the equalities above. Next, 
we prove the equalities for s ^ G by induction. First, the first and only summand 
in the right hand sides above equals to 0 while also vjv-i(s) = Vjv_i(s) = 0. 
Next, let fc < fV — 1 assuming the equalities above for fc + 1. 


Yfe(s) =max^PY(s,a,s') • Yfc+i(s') 
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since the first summand equals to zero. Similarly for Wk{s), we have 
Wfc(s) =maxy^P^(s,a, s')wfe+i(s') 

s' 

= maxy^P^(s,a, s') sup Pr®‘" [O^at-iG | s'@A: + 1] 

“ , (tG Unt 
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= sup Pr®'" [O^at-iG I s@k] 
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and the same hold when ranging over schedulers in Tim^. Finally, for Vfc(s) 
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and again the same hold when ranging over schedulers in Timy. □ 

Lemma It holds that vaP < val^, and for any CTMDP Cj, yal < val^ < ^1. 

Proof. vaP < val^ follows directly from the fact that Tirrif, C Tim£. yal < toI 
since 


sup 
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(optimizing for each subset separately yields higher value). 

Furthermore Unt C Time implies yal < vaP, and for each i G No, it follows from 
Lemma ED that 
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LemmaIn any CTMDP C^, Hvq — y|d||oo < s ■ k and ||vo — val||oo < s ■ k. 


Proof. The proof follows from Lemma A.l It is completed by the observation [3] 
that the probability of ^ IV steps to be taken within T is ^ e ■ k. This is 
independent of the scheduler and hence, for both Vq and vq we obtain the desired 
error bound. □ 


Lemma 


We have lim g\ ^ 0 where gx denotes the gap ||yal — val||oo hr 


X—¥Q 


X • 


Proof. From Lemmata and we have that for any A, yal ^ val^ ^ val. It 
remains to show that for any state s G S, we have 


lim |yal(s) — val(s)| —>■ 0. 

A—>-oo 


( 1 ) 















Let Ao be the maximal exit rate in C and let e > 0. We need to find a uni- 
formisation rate A = fcAp such that |val(s) — to1(s)| ^ e. Consider Chebyshev 
inequality 

PrWiixT - AT| ^ ma] < ^ 

where a is standard deviation of V'at and m > 0. Let ^ = |. Then Chebyshev 
inequality for ip\T can be written as 

Pril-ipxT - AT| ^ ^ I 

or 

Pr[^XT e (AT - XT + > 1 - I 

For any uniformisation rate A = fc • Aq, we define ax = [XT — ^y6XT/e\ and 
bx = [XT + ^QXT/e + 1]. Then, 
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In the uniformised model C^.Xg j the probability of not changing state in one step 
is 

A — E{s, a) ^ A — Ao k — 1 
A ^ A ^ k 

Therefore, for cx = bx — ax ^ 2{x/6XTje + 1), the probability px of not changing 
state at all within cx steps satisfies 
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Instead of 0 we prove that there is A such that 

|vo^(s) - Vo^(s)| ^ s/2. 
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where (s), (s), and (s) is defined for all k as v, w, and v, only replacing 

N hy bx. This suffices, as from proof of Lemma|^ the approximations with upper 
bound b\ are only more precise than approximations with upper bound N. Using 
(|^, we fix A to be such that px^ 1 — s/6. Then, we have 

(s) > (s) 

and from ([^, we can obtain by straightforward induction 

where Uf. is defined as y^. except for goal states having value 1 instead of the 
sum of poisson probabilities. The term (s) now equals by definition the term 
and hence 

= - I ^ E 

i=ax 


Furthermore, from the fact that px ^ 1 —e/6, we obtain that each ^ 
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and finally, we get by definition 
= vo^(s) - e/2. 
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Let I?(A) denote the set of all distributions over a discrete set A. Then 

Definition 5. A stochastic update scheduler a on a CTMDP C = (S', Act, R) is 
a tuple a — (At, CTm, tto), where 

- M. is a countable set of memory elements 

- cr„ : At X S X IRj.0 tA(At, Act) is the update function 

- ttq : S I—I?(At) distribution over initial memory values 

The system operates under a stochastic update scheduler as follows. At first 
initial memory values are sampled from the distribution 7ro(s). Afterwards, given 
current memory value, current state and time spent in the state so far (not 
the time from the beginning of the process), the stochastic update function cr^ 
continuously updates the memory value and the action to be taken. When the 
system decides to leave the state, upon entering the successor state the memory 
is also updated by one. 


Lemmaj^ The values {yk)o<k<N computed by Algorithm]^ for given C, V, and 
£ > 0 yield a stochastic update scheduler that is £-optimal in C. 

Proof. Let C = {S, Act, be the original CTMDP, A - the uniformization rate 
computed by Algorithm]^ = {S\, Act, Ra) - CTMDP uniformised with rate 
A. Computation of the lower bound Vq involves as well computation of the e- 
optimal scheduler that attains the bound. Let : 5 'a x N>^^o be 

this scheduler and iC^nt = *Ry„t) - the CTMC induced by where 

iSunt = 5'a X No- Then, 


R(s', (Ty„((s', m), s") if Si = (s', m) and S 2 = (s", m + 1) 


iRj/„t(si, S 2 ) = <( A - E{s',a^.^^{s',m)) if si = (s',m) and S 2 = (s',m + 1) 


otherwise 


Let 7r(^s,m) (s', m+k, t) be the transient probability in for state (s, m+k), 
given that the system starts from state (s,m). Then 


{ „ —At A—Eo)---{X — Et,_2)(X—Ek-i) Mk gt _ g 

^-At (A-i;o)-(A-g.^- 2 )RC,^fc-i.«') ^fc otherwise 

where = ^Jfnt(s, R'- + *)) ^’^el Ei = E{s, ai). 

W.l.o.g. we assume that after N transitions have been performed the sched¬ 
uler takes the same decision for every state, irrespectively of the mem¬ 

ory value. We denote this decision as aAr(s). We now define the finite memory 
stochastic update scheduler a Tim = (A^, o-„, tto): 

- M = [0..A] UT 
- Vto, m' g A4,m,^ E, s G S, t G 


crj(T, s, t) := [(T, aAr(s)) 1 —)■ 1, otherwise 1 —)■ 0] 


7r(g „j)(s, m', t) if m'G [m-I-1..A] and 

a = o^ant(s>™^)-’^Lere 
m® = m, mf = m' 


a^(m,s,t)(m',a) := < 


00 

S '^(s,m) (s, JV + k, t) if m' = E and 

fc=i ’ __ 

V = e and a = alf.^f(s, m) or 

V = £ and a = aiy(s) 


0 


otherwise 









- Vs G S' 7ro(s) := [0 !-)■ 1, otherwise i-G 0] 

Intuitively, when the system moves to a state s and memory m has been 
collected up until this moment, it updates the memory according to the sub¬ 
process C'^ (s, m) of while residing in s and the memory update is finished 
when C decides to leave s. This sub-process Cj{s,m) is a process that starts 
when moves to the state s and evolves when the uniformized system takes 
introduced high-rate transitions. Thus, the value of the memory is equivalent 
to the length of the history of the uniformized process, i.e. the arim simulates 
evolution of the uniformized process and takes exactly the decisions that aunt 
would take. Thus, the transient distribution of the processes induced by arim 
and aunt are exactly the same. 

The amount of memory used by a Tim is in 0(A^|S'|). 



